Prostate cancer associated circulating nucleic acid biomarkers

ABSTRACT

The invention provides methods and reagents for diagnosing prostate cancer that are based on the detection of biomarkers in the circulating nucleic acids from a patient to be evaluated.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. application Ser. No. 13/701,568, filed Feb. 19, 2013; which is a U.S. National Phase of PCT/US2011/038780, filed Jun. 1, 2011; which claims benefit of U.S. provisional application no. 61/351,708, filed Jun. 4, 2010. Each application is herein incorporated by reference.

REFERENCE TO SEQUENCE LISTING

This application includes a Sequence Listing as a text file named “SEQTXT_83443-944440-001320US.txt” created Jul. 9, 2015 and containing 1,560,576 bytes. The material contained in this text file is incorporated by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

Methods to detect prostate cancer, including PSA tests, are extremely unreliable (see, e.g., Wever et al., J. Natl Cancer Inst 102:352-355, 2010; Schröder et al., N. Engl. J. Med 360:1320-1328, 2009). There is a need for effective detection methods. This invention addresses that need.

BRIEF SUMMARY OF THE INVENTION

The invention is based, in part, on the discovery of circulating nucleic acid (CNA) biomarkers associated with prostate cancer. In some embodiments, the CNA biomarkers are nucleic acid sequences, in the current invention DNA sequences, that are present in the blood, e.g., in a serum or plasma sample, of a prostate cancer patient, but are rarely present, if at all, in the blood, e.g., a serum or plasma sample, obtained from a normal individual, i.e., in the context of this invention, an individual that does not have prostate cancer. In some embodiments, the CNA biomarkers are nucleic acid sequences, in the current invention DNA sequences, i.e., DNA fragments, that are present in the blood, e.g., in a serum or plasma sample, of a normal individual, but are rarely present, if at all, in the blood, e.g., a serum or plasma sample, obtained from a prostate cancer patient.

Accordingly, in one aspect, the invention provides a method of analyzing CNA in a sample (blood, serum or plasma) from a patient comprising detecting the presence of at least one cell-free DNA having a nucleotide sequence falling within a chromosomal region set forth in Tables 2-5 (or having a nucleotide sequence that is part of one of the sequences set forth in Tables A-D) in the sample. In some embodiments, detecting the presence of, or the amount of, the at least one biomarker comprises detecting a cell-free DNA molecule having between 50 and 400 consecutive nucleotides of a unique sequence within a chromosomal region as set forth in Tables 2-5 (or of a unique sequence set forth in Tables A-D).

In one embodiment, a method of analyzing circulating free DNA in a patient sample is provided, comprising determining, in a sample that is blood, serum or plasma, the presence or absence, or the amount of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 30, 40, or at least 50 cell-free DNA molecules each having a sequence falling within a different chromosomal region set forth in Table 2, 3, 4, or 5, and preferably the sequences of the cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table selected from Tables 2-5.

In another aspect, the present invention provides a kit including two or more (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, or at least 50, but less than 115) sets of oligonucleotides. Each set comprises one or more oligonucleotides with a nucleotide sequence falling within one single chromosomal region that is set forth in Tables 2-5. Preferably, different oligonucleotide sets correspond to different chromosomal regions within the same table selected from Tables 2-5. Also, preferably the oligonucleotides are free of repetitive elements. Optionally, the oligonucleotides are attached to one or more solid substrates such as microchips and beads.

In another aspect, the present invention provides a method of diagnosing or screening for prostate cancer in a patient. The method includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the presence or absence or the amount of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 30, 40, or at least 50 cell-free DNA molecules each having a sequence falling within a different chromosomal region set forth in Table 2, 3, or 4; and (b) correlating the presence of, or an increased amount of, said cell-free DNA molecules with an increased likelihood that the patient has prostate cancer. Preferably, the sequences of the cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table chosen from Tables 2-4.

In one aspect, the invention provides a method of identifying a patient that has a CNA biomarker associated with prostate cancer, the method comprising detecting the presence of at least one biomarker set forth in Table 2, Table 3, or Table 4 in a CNA sample obtained from serum or plasma from the patient. A biomarker can be identified using any number of methods, including sequencing of CNA as well as use of a probe or probe set to detect the presence of the biomarker.

In some embodiments, the invention provides a method of identifying a patient that has a CNA biomarker that is associated with the absence of prostate cancer, the method comprising detecting the presence of at least one biomarker set forth in Table 5 in a CNA sample from serum or plasma from the patient. A biomarker can be identified using any number of methods, including sequencing of CNA as well as use of a probe or probe set to detect the presence of the biomarker.

In a further aspect, the invention provides a kit for identifying a patient that has a biomarker for prostate cancer and/or that has a biomarker associated with a normal individual that does not have prostate cancer, wherein the kit comprises at least one polynucleotide probe to a biomarker set forth in Table 2, 3, 4, or 5. Preferably, such a kit comprises probes to multiple biomarkers, e.g., at least 2, 3, 4, 5, 10, 20, 30, 40, 50, or more, of the biomarkers set forth in Tables 2-5. In some embodiments, the kit also includes an electronic device or computer software to compare the hybridization patterns of the CNA in the patient sample to a prostate cancer data set comprising a listing of biomarkers that are present in prostate cancer patient CNA, but not in CNA samples from normal individuals.

In some embodiments, the presence of the at least one biomarker in CNA is determined by sequencing. In some embodiments, the presence of the at least one biomarker in CNA is determined using an array. In some embodiments, the presence of the at least one biomarker in CNA is determined using an assay that comprises an amplification reaction, such as a polymerase chain reaction (PCR). In some embodiments, a nucleic acid array forming a probe set comprising probes to two or more chromosomal regions set forth in Tables 2-5 is employed. In some embodiments, a nucleic acid array forming a probe set comprising 2, 3, 4, 5, 6, 7, 8, 9, 10 or more chromosomal regions, or all of the chromosomal regions, set forth in Table 2 is employed. In some embodiments, a nucleic acid array forming a probe set comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, or more chromosomal regions, or all of the chromosomal regions, set forth in Table 3 is employed. In some embodiments, a nucleic acid array forming a probe set comprising 2, 3, 4, 5, 6, 7, 8, 9, 10 or more chromosomal regions, or all of the chromosomal regions, set forth in Table 4 is employed. In some embodiments, a nucleic acid array forming a probe set comprising two or more chromosomal regions, or all of the chromosomal regions, set forth in Table 5 is employed.

In an additional aspect, the invention provides a method of detecting prostate cancer in a patient that has, or is suspected of having, prostate cancer, the method comprising contacting DNA from a serum or plasma sample of the patient with a probe that selectively hybridizes to a sequence present on a chromosomal region described herein, e.g., a sequence set forth in Tables A-D, under conditions in which the probe selectively hybridizes to the sequence; and detecting the presence or absence of hybridization of the probe, wherein the presence of hybridization to a sequence set forth in Table A, B, or C is indicative of prostate cancer.

The Tables of Sequences A, B, C, and D provide examples of sequences corresponding to the chromosome regions set forth in Table 2, Table 3, Table 4, and Table 5, respectively. The designation (N)x in Tables A-D refers to repetitive element sequences.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides an example of a ROC curve using the 42 highest ranking regions of the chromosomal regions identified in Tables 4 and 5, collectively. The actual ROC curve built from score sums is given together with the 95% confidence limits.

DETAILED DESCRIPTION OF THE INVENTION

As used herein, a “biomarker” refers to a nucleic acid sequence that corresponds to a chromosomal region, where the presence of the nucleic acid in CNA is associated with prostate cancer.

In the current invention, a “chromosomal region” listed in any one of Tables 1 to 5 refers to the region of the chromosome that corresponds to the nucleotide positions indicated in the tables. The nucleotide positions on the chromosomes are numbered according to Homo sapiens (human) genome, build 37.1 as of June 2010 (Tables 1-3) and Homo sapiens (human) genome, build 36 as of March 2006 (Tables 4 and 5). As understood in the art, there are naturally occurring polymorphisms in the genome of individuals. Thus, each chromosome region listed in the tables encompasses allelic variants as well as the particular sequence in the database, e.g., the sequences in Tables A-D correspond to the chromosomal regions noted. An allelic variant typically has at least 95% identity, often at least 96%, at least 97%, at least 98%, or at least 99% identity to the sequence of a chromosomal region noted in the Tables that is present in a particular database, e.g., the National Center for Biotechnology Information (Homo sapiens Build 37.1 at the website http://followed by www.ncbi.nlm.nih.gov/mapview/). Percent identity can be determined using well known algorithms, including the BLAST algorithm, e.g., set to the default parameters. Further, it is understood that the nucleotide sequences of the chromosomes may be improved upon as errors in the current database are discovered and corrected. The term “chromosomal region” encompasses any variant or corrected version of the same region as defined in Tables 1-5. Given the information provided in the tables in the present disclosure, especially in view of the sequences listed in Tables A-D, a skilled person in the art will be able to understand the chromosomal regions used for the present invention even after new variants are discovered or errors are corrected.
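
By way of illustration only, the 95% identity threshold for an allelic variant can be sketched as follows. This is a minimal sketch that assumes the two sequences have already been aligned to equal length; in practice an alignment tool such as BLAST would produce the alignment. The function names and the example sequences are hypothetical and are not taken from the Tables.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def is_allelic_variant(candidate: str, reference: str, threshold: float = 95.0) -> bool:
    """True if the candidate meets the identity threshold (95% by default, per the text)."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 40-nucleotide reference and a variant differing at one position.
reference = "ACGT" * 10
variant = "T" + reference[1:]
print(percent_identity(variant, reference))    # 97.5
print(is_allelic_variant(variant, reference))  # True
```

The threshold is a parameter so that the stricter 96% to 99% values mentioned above can be applied without changing the function.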

“Detecting the presence of a chromosomal region” in CNA in the context of this invention refers to detecting any sequence from a chromosomal region shown in Table 2, 3, 4, or 5, where the sequence detected can be assigned unambiguously to that chromosomal region. Thus, this term refers to the detection of unique sequences from the chromosomal regions. Methods of removing repetitive sequences from the analysis are known in the art and include use of blocking DNA, e.g., when the target nucleic acids are identified by hybridization. In some embodiments, typically where the presence of a prostate cancer biomarker is determined by sequencing the CNA from a patient, well known computer programs and manipulations can be used to remove repetitive sequences from the analysis (see, e.g., the EXAMPLES section). In addition, sequences that have multiple equally fitting alignments to the reference database are typically omitted from further analyses.

The term “detecting a biomarker” as used herein refers to detecting a sequence from a chromosomal region listed in Table 2, Table 3, Table 4, or Table 5. A biomarker is considered to be present if any nucleic acid sequence present in the CNA is unambiguously assigned to the chromosomal region.

The term “unambiguously assigned” in the context of this invention refers to determining that a DNA detected in the CNA of a patient is from a particular chromosomal region. Thus, in detection methods that employ hybridization, the probe hybridizes specifically to that region. In detection methods that employ amplification, the primer(s) hybridizes specifically to that region. In detection methods that employ sequencing, the sequence is assigned to that region based on well-known algorithms for identity, such as the BLAST algorithm using highly stringent parameters, such as e<0.0001. In addition, such a sequence does not have a further equally fitting hit on the database used.
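
For sequencing-based detection, the assignment rule above can be sketched as a filter over alignment hits. This is an illustrative sketch, not the procedure actually used: the `Hit` record, its field names, and the use of an equal bit score to model an “equally fitting hit” are assumptions; only the e<0.0001 cutoff comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

E_VALUE_CUTOFF = 1e-4  # the e<0.0001 stringency mentioned in the text


@dataclass
class Hit:
    """One alignment hit of a read against the reference (hypothetical record)."""
    region: str       # label of the locus the read aligned to
    evalue: float     # expectation value reported by the aligner
    bit_score: float  # alignment score; a tie is treated as "equally fitting"


def assign_unambiguously(hits: List[Hit]) -> Optional[str]:
    """Return the region of the single best hit, or None if the read is ambiguous."""
    passing = [h for h in hits if h.evalue < E_VALUE_CUTOFF]
    if not passing:
        return None  # no hit meets the stringency cutoff
    best_score = max(h.bit_score for h in passing)
    top = [h for h in passing if h.bit_score == best_score]
    if len(top) > 1:
        return None  # a further equally fitting hit exists, so the read is omitted
    return top[0].region
```

A read with one strong hit is assigned to that hit's region, while a read with two equally scored hits returns `None` and would be dropped from further analysis.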

The term “circulating nucleic acids” refers to acellular nucleic acids that are present in the blood.

The term “circulating cell-free DNA” as used herein means free DNA molecules of 25 nucleotides or longer that are not contained within any intact cells in human blood, and can be obtained from human serum or plasma.

The term “hybridization” refers to the formation of a duplex structure by two single stranded nucleic acids due to complementary base pairing. Hybridization can occur between exactly complementary nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch. As used herein, the term “substantially complementary” refers to sequences that are complementary except for minor regions of mismatch. Typically, the total number of mismatched nucleotides over a hybridizing region is not more than 3 nucleotides for sequences about 15 nucleotides in length. Conditions under which only exactly complementary nucleic acid strands will hybridize are referred to as “stringent” or “sequence-specific” hybridization conditions. Stable duplexes of substantially complementary nucleic acids can be achieved under less stringent hybridization conditions. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length and base pair concentration of the oligonucleotides, ionic strength, and incidence of mismatched base pairs. For example, computer software for calculating duplex stability is commercially available from National Biosciences, Inc. (Plymouth, Minn.), e.g., OLIGO version 5, or from DNA Software (Ann Arbor, Mich.), e.g., Visual OMP 6.

Stringent, sequence-specific hybridization conditions, under which an oligonucleotide will hybridize only to the target sequence, are well known in the art (see, e.g., the general references provided in the section on detecting polymorphisms in nucleic acid sequences). Stringent conditions are sequence-dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5° C. lower to 5° C. higher than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the duplex strands have dissociated. Relaxing the stringency of the hybridizing conditions will allow sequence mismatches to be tolerated; the degree of mismatch tolerated can be controlled by suitable adjustment of the hybridization conditions.
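
As a rough illustration of the temperature window described above, the following sketch estimates Tm with the Wallace rule, Tm = 2(A+T) + 4(G+C) in ° C. The Wallace rule is an assumption introduced here for illustration; it is a rule of thumb for short oligonucleotides and is not stated in this disclosure. Nearest-neighbor calculations, such as those performed by the commercial software mentioned herein, would be used in practice. The example oligonucleotide is hypothetical.

```python
from typing import Tuple


def wallace_tm(oligo: str) -> float:
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be non-empty and contain only A, C, G, T")
    return 2.0 * at + 4.0 * gc


def stringent_window(oligo: str, margin: float = 5.0) -> Tuple[float, float]:
    """Temperature range from Tm - margin to Tm + margin (margin of 5 per the text)."""
    tm = wallace_tm(oligo)
    return (tm - margin, tm + margin)


# Hypothetical 15-mer: 7 A/T and 8 G/C bases.
print(wallace_tm("ACGTACGTACGTACG"))        # 46.0
print(stringent_window("ACGTACGTACGTACG"))  # (41.0, 51.0)
```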

The term “primer” refers to an oligonucleotide that acts as a point of initiation of DNA synthesis under conditions in which synthesis of a primer extension product complementary to a nucleic acid strand is induced, i.e., in the presence of four different nucleoside triphosphates and an agent for polymerization (i.e., DNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. A primer is preferably a single-stranded oligodeoxyribonucleotide. The primer includes a “hybridizing region” exactly or substantially complementary to the target sequence, preferably about 15 to about 35 nucleotides in length. A primer oligonucleotide can either consist entirely of the hybridizing region or can contain additional features which allow for the detection, immobilization, or manipulation of the amplified product, but which do not alter the ability of the primer to serve as a starting reagent for DNA synthesis. For example, a nucleic acid sequence tail can be included at the 5′ end of the primer that hybridizes to a capture oligonucleotide.

The term “probe” refers to an oligonucleotide that selectively hybridizes to a target nucleic acid under suitable conditions. A probe for detection of the biomarker sequences described herein can be any length, e.g., from 15-500 bp in length. Typically, in probe-based assays, hybridization probes that are less than 50 bp are preferred.

The term “target sequence” or “target region” refers to a region of a nucleic acid that is to be analyzed and comprises the sequence of interest.

As used herein, the terms “nucleic acid,” “polynucleotide” and “oligonucleotide” refer to primers, probes, and oligomer fragments. The terms are not limited by length and are generic to linear polymers of polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), and any other N-glycoside of a purine or pyrimidine base, or modified purine or pyrimidine bases. These terms include double- and single-stranded DNA, as well as double- and single-stranded RNA. Oligonucleotides for use in the invention may be used as primers and/or probes.

A nucleic acid, polynucleotide or oligonucleotide can comprise phosphodiester linkages or modified linkages including, but not limited to phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sulfone linkages, and combinations of such linkages.

A nucleic acid, polynucleotide or oligonucleotide can comprise the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil) and/or bases other than the five biologically occurring bases. These bases may serve a number of purposes, e.g., to stabilize or destabilize hybridization; to promote or inhibit probe degradation; or as attachment points for detectable moieties or quencher moieties. For example, a polynucleotide of the invention can contain one or more modified, non-standard, or derivatized base moieties, including, but not limited to, N6-methyl-adenine, N6-tert-butyl-benzyl-adenine, imidazole, substituted imidazoles, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, 2,6-diaminopurine, and 5-propynyl pyrimidine. Other examples of modified, non-standard, or derivatized base moieties may be found in U.S. Pat. Nos. 6,001,611; 5,955,589; 5,844,106; 5,789,562; 5,750,343; 5,728,525; and 5,679,785, each of which is incorporated herein by reference in its entirety. Furthermore, a nucleic acid, polynucleotide or oligonucleotide can comprise one or more modified sugar moieties including, but not limited to, arabinose, 2-fluoroarabinose, xylulose, and a hexose.

The term “repetitive element” as used herein refers to a stretch of DNA sequence of at least 25 nucleotides in length that is present in the human genome in at least 50 copies.
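
The two thresholds in this definition (a length of at least 25 nucleotides and at least 50 copies) can be illustrated with a toy check. This sketch is an assumption-laden simplification: it counts exact matches on a single strand of an in-memory string, whereas screening against the human genome would use an indexed reference and a repeat-annotation tool. All names and sequences are hypothetical.

```python
def count_occurrences(genome: str, fragment: str) -> int:
    """Count exact, possibly overlapping, occurrences of fragment in genome."""
    count = 0
    start = genome.find(fragment)
    while start != -1:
        count += 1
        start = genome.find(fragment, start + 1)
    return count


def is_repetitive_element(fragment: str, genome: str,
                          min_length: int = 25, min_copies: int = 50) -> bool:
    """Apply the definition: at least min_length nucleotides and at least min_copies copies."""
    if len(fragment) < min_length:
        return False
    return count_occurrences(genome, fragment) >= min_copies


# Toy genome: a 25-nucleotide unit repeated 60 times, separated by 'N'.
unit = "ACGTTGCAAGGCTTAACCGGTTAAC"
toy_genome = (unit + "N") * 60
print(is_repetitive_element(unit, toy_genome))  # True
```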

The terms “arrays,” “microarrays,” and “DNA chips” are used herein interchangeably to refer to an array of distinct polynucleotides affixed to a substrate, such as glass, plastic, paper, nylon or other type of membrane, filter, chip, bead, or any other suitable solid support. The polynucleotides can be synthesized directly on the substrate, or synthesized separate from the substrate and then affixed to the substrate. The arrays are prepared using known methods.

Introduction

The invention is based, at least in part, on the identification of CNA sequences from particular chromosomal regions that are present, or present at an increased amount, in the blood of patients that have prostate cancer, but are rarely, if ever, present, or are present at a lower amount, in the blood of normal patients that do not have prostate cancer. The invention is also based, in part, on the identification of biomarkers in the CNA in normal individuals, i.e., in the context of this invention, individuals not diagnosed with prostate cancer, that are rarely, if ever, present in patients with prostate cancer. Thus, the invention provides methods and devices for analyzing for the presence of sequences from a chromosomal region corresponding to at least one of the chromosomal regions set forth in Table 2, Table 3, Table 4, or Table 5.

Accordingly, in one aspect, the invention provides a method of analyzing CNA in a sample (blood, serum or plasma) from a patient comprising detecting the presence of, or an amount of, at least one circulating cell-free DNA having a nucleotide sequence of at least 25 nucleotides falling within a chromosomal region set forth in Table 2, 3, 4, or 5 (or having a nucleotide sequence that is a part of one of the sequences set forth in Table A, B, C, or D). Preferably, the circulating cell-free DNA is free of repetitive elements. In one embodiment, the patient is an individual suspected of or diagnosed with cancer, e.g., prostate cancer.

By “falling within” it is meant herein that the nucleotide sequence of a circulating cell-free DNA is substantially identical (e.g., greater than 95% identical) to a part of the nucleotide sequence of a chromosome region. In other words, the circulating cell-free DNA can hybridize, under stringent conditions, to the chromosomal region, or be derived from it.

In one embodiment, a method of analyzing circulating cell-free DNA in a patient sample is provided, comprising determining, in a sample that is blood, serum or plasma, the presence or the amount of, a plurality of circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within the same one single chromosomal region set forth in Table 2, Table 3, Table 4, or Table 5 (or each having a nucleotide sequence of at least 25 consecutive nucleotides in length that is a part of the same sequence set forth in Table A, Table B, Table C, or Table D). There may be two or more or any number of different circulating cell-free DNA molecules that are all derived from the same one chromosomal region set forth in Table 2, 3, 4, or 5, and in some embodiments, all such circulating cell-free DNA molecules are detected and/or the amounts thereof are determined.

Preferably the sequences of the circulating cell-free DNA molecules are free of repetitive elements.

In one embodiment, a method of analyzing circulating cell-free DNA in a patient sample is provided, comprising determining, in a sample that is blood, serum or plasma, the presence or absence or the amount of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 30, 40, or at least 50 circulating cell-free DNA molecules each having a sequence of at least 25 base pairs falling within a different chromosomal region set forth in Table 2, Table 3, Table 4, or Table 5 (or having a nucleotide sequence of at least 25, 40, 50, 60, 75 or 100 consecutive nucleotides in length that is a part of one of the sequences set forth in Table A, Table B, Table C, or Table D). Preferably the sequences of the circulating cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table that is chosen from Tables 2-5. In one specific embodiment, the presence or absence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, or at least 25, or of 30, 31, or 32, circulating cell-free DNA molecules are determined, the sequence of each falling within a different chromosomal region set forth in Table 2 (or having a nucleotide sequence of at least 25, 40, 50, 60, 75 or 100 consecutive nucleotides in length that is a part of one of the sequences set forth in Table A). In another specific embodiment, the presence or absence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, or at least 15, or of 20, 21, or 22, circulating cell-free DNA molecules are determined, the sequence of each falling within a different chromosomal region set forth in Table 3 (or having a nucleotide sequence of at least 25, 40, 50, 60, 75 or 100 consecutive nucleotides in length that is a part of one of the sequences set forth in Table B). In one specific embodiment, the presence or absence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50, or of 52, circulating cell-free DNA molecules are determined, the sequence of each falling within a different chromosomal region set forth in Table 4 (or having a nucleotide sequence of at least 25, 40, 50, 60, 75 or 100 consecutive nucleotides in length that is a part of one of the sequences set forth in Table C). In yet another specific embodiment, the presence or absence or the amounts of 1, 2, 3, or 4 circulating cell-free DNA molecules are determined, each having a sequence falling within a different chromosomal region set forth in Table 5 (or having a nucleotide sequence of at least 25, 40, 50, 60, 75 or 100 consecutive nucleotides in length that is a part of one of the sequences set forth in Table D).

In a specific embodiment, the method of analyzing circulating cell-free DNA includes the steps of: isolating, from a blood, serum or plasma sample of a patient, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, obtaining the sequence of each of the circulating cell-free DNA molecules, and comparing the sequence to one or more of the sequences set forth in Tables A-D to determine whether the sequence falls within a chromosomal region set forth in Table 2, 3, 4, or 5.
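
Once a sequenced fragment has been aligned to the reference genome, the final comparison step can be sketched as an interval check. This is a minimal sketch under stated assumptions: the `Region` record, the 1-based inclusive coordinates, the requirement that the whole aligned fragment lie inside the region, and the example coordinates are all hypothetical placeholders and do not reproduce any entry of Tables 2-5.

```python
from typing import NamedTuple, Optional, Sequence

MIN_FRAGMENT_LENGTH = 25  # one of the minimum lengths recited in the text


class Region(NamedTuple):
    """A chromosomal region with 1-based inclusive coordinates (hypothetical layout)."""
    name: str
    chrom: str
    start: int
    end: int


def region_for_fragment(chrom: str, frag_start: int, frag_end: int,
                        regions: Sequence[Region]) -> Optional[str]:
    """Return the name of the region that fully contains the aligned fragment, if any."""
    if frag_end - frag_start + 1 < MIN_FRAGMENT_LENGTH:
        return None  # fragment is shorter than the minimum length
    for region in regions:
        if (region.chrom == chrom
                and region.start <= frag_start
                and frag_end <= region.end):
            return region.name
    return None


# Placeholder region; the coordinates are invented for illustration only.
regions = [Region("hypothetical_region_1", "chr1", 1000, 2000)]
print(region_for_fragment("chr1", 1100, 1180, regions))  # hypothetical_region_1
print(region_for_fragment("chr1", 1990, 2050, regions))  # None
```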

In another specific embodiment, the method of analyzing circulating cell-free DNA includes the steps of: isolating, from a blood, serum or plasma sample of a patient, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, and contacting the circulating cell-free DNA molecules with a plurality of oligonucleotide probes (e.g., on a DNA chip or microarray) to determine if one of the circulating cell-free DNA molecules hybridizes to any one of the plurality of oligonucleotide probes under stringent conditions. Each of the oligonucleotide probes has a nucleotide sequence identical to a part of the sequence of a chromosomal region chosen from Tables 2-5 (or a sequence set forth in Tables A-D). Thus, if a circulating DNA molecule hybridizes under stringent conditions to one of the oligonucleotide probes, it indicates that the circulating DNA molecule has a nucleotide sequence falling within a chromosomal region set forth in Table 2, Table 3, Table 4, or Table 5.

In the above various embodiments, preferably the circulating cell-free DNA molecules are at least 25 consecutive nucleotides in length (preferably at least 50, 70, 80, 100, 120 or 200 consecutive nucleotides in length). More preferably, the circulating cell-free DNA molecules have between about 50 and about 300 or 400, preferably from about 75 to about 300 or 400, more preferably from about 100 to about 200 consecutive nucleotides of a unique sequence within a chromosomal region as set forth in Table 2, Table 3, Table 4, or Table 5 (or of a unique sequence set forth in Table A, Table B, Table C, or Table D).

In another aspect, the present invention provides a method of diagnosing or screening for prostate cancer in a patient. The method includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the presence or absence or the amount of, at least 1, 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 30, 40, or at least 50 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region set forth in Table 2, Table 3, or Table 4 (or having a nucleotide sequence of at least 25 consecutive nucleotides in length that is a part of one of the sequences set forth in Table A, Table B, or Table C); and (b) correlating the presence of or an increased amount of the circulating cell-free DNAs with an increased likelihood that the patient has prostate cancer.

Alternatively, the method of the invention includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the presence or absence or the amount of 1, 2, 3, or 4 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region set forth in Table 5 (or having a nucleotide sequence of at least 25 consecutive nucleotides in length that is a part of one of the sequences set forth in Table D); and (b) correlating the presence of or an increased amount of the circulating cell-free DNAs with a decreased likelihood that the patient has prostate cancer.
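
The determining step (a) of the two methods above can be pictured as a tally of which regions were detected in a sample. The sketch below is illustrative only: the function name and region labels are hypothetical, and this passage does not specify how such counts are converted into a likelihood, so the sketch stops at counting and makes no diagnostic call.

```python
from typing import Dict, Iterable


def tally_detected_regions(detected: Iterable[str],
                           cancer_associated: Iterable[str],
                           normal_associated: Iterable[str]) -> Dict[str, int]:
    """Count detected regions in each group (e.g., Tables 2-4 versus Table 5)."""
    found = set(detected)
    return {
        "cancer_associated": len(found & set(cancer_associated)),
        "normal_associated": len(found & set(normal_associated)),
    }


# Placeholder labels; none of these names comes from the Tables.
counts = tally_detected_regions(
    detected=["c1", "c2", "n1", "other"],
    cancer_associated={"c1", "c2", "c3"},
    normal_associated={"n1", "n2"},
)
print(counts)  # {'cancer_associated': 2, 'normal_associated': 1}
```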

When the steps of the above methods are applied to a patient diagnosed with cancer, the patient may be monitored for the status of prostate cancer, or for determining the treatment effect of a particular treatment regimen, or detecting cancer recurrence or relapse.

In the diagnosis/monitoring method of the present invention, preferably the sequences of the circulating cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table chosen from Tables 2-5.

In one embodiment, a method of diagnosing prostate cancer in an individual is provided, comprising (a) determining the presence or absence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, or at least 25, or of 30, 31, or 32, circulating cell-free DNA molecules, the sequence of each falling within a different chromosomal region set forth in Table 2 (or having a nucleotide sequence of at least 25 consecutive nucleotides in length that is a part of one of the sequences set forth in Table A); and (b) correlating the presence of, or an increased amount of, one or more of the circulating cell-free DNA molecules with an increased likelihood that the individual has prostate cancer or a recurrence of prostate cancer or a failure of treatment for prostate cancer.

In another embodiment, a method of diagnosing prostate cancer in an individual is provided, comprising (a) determining the presence or absence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, or at least 15, or of 20, 21, or 22, circulating cell-free DNA molecules, the sequence of each falling within a different chromosomal region set forth in Table 3 (or having a nucleotide sequence of at least 25 consecutive nucleotides in length that is a part of one of the sequences set forth in Table B); and (b) correlating the presence of, or an increased amount of, one or more of the circulating cell-free DNA molecules with an increased likelihood that the individual has prostate cancer or a recurrence of prostate cancer or a failure of treatment for prostate cancer.

In one embodiment, a method of diagnosing prostate cancer in anindividual is provided, comprising (a) determining the presence orabsence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25,30, 35, 40, 45, or 50, or of 52, circulating cell-free DNA molecules thesequence of each falling within a different chromosomal region set forthin Table 4 (or having a nucleotide sequence of at least 25 consecutivenucleotides in length that is a part of one of the sequences set forthin Table C); and (b) correlating the presence of or an increased amountof, one or more of the circulating cell-free DNA molecules with anincreased likelihood that the individual has prostate cancer or arecurrence of prostate cancer or a failure of treatment for prostatecancer.

In one embodiment, a method of diagnosing/monitoring prostate cancer inan individual is provided, comprising (a) determining the presence orabsence or the amounts of, 1, 2, 3, or 4 circulating cell-free DNAmolecules the sequence of each falling within a different chromosomalregion set forth in Table 5 (or having a nucleotide sequence of at least25 consecutive nucleotides in length that is a part of one of thesequences set forth in Table D); and (b) correlating the presence of oran increased amount of, one or more of the circulating cell-free DNAmolecules with an increased likelihood that the individual does not haveprostate cancer or a recurrence of prostate cancer or a failure oftreatment for prostate cancer.

In yet another embodiment, the method of diagnosing, monitoring or screening for prostate cancer in a patient includes determining, in a sample that is blood, serum or plasma from the patient, the presence or absence or the total amount of each and all circulating cell-free DNAs each having a sequence falling within the same one single chromosomal region set forth in Table 2, Table 3, or Table 4; and correlating the presence of one of said circulating cell-free DNAs, or an increased total amount of said circulating cell-free DNAs, with an increased likelihood that said patient has prostate cancer, or recurrence of prostate cancer. In other words, there can be any number of, and typically many, different circulating cell-free DNA molecules derived from one single same chromosomal region set forth in Table 2, Table 3, or Table 4, and all of such different circulating cell-free DNA molecules are detected and/or the amount of each or all is determined, and correlation with the status of prostate cancer is made.

In a specific embodiment, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, are isolated from a blood, serum or plasma sample of a patient. The sequence of each of the circulating cell-free DNA molecules is determined, and compared with one or more of the sequences set forth in Tables A-D to determine whether the sequence of a circulating cell-free DNA falls within a chromosomal region set forth in Table 2, Table 3, or Table 4. If so, a diagnosis of prostate cancer is made. In the case of a patient treated with a therapy for prostate cancer, recurrence is indicated if a circulating cell-free DNA falling within a chromosomal region set forth in Table 2, Table 3, or Table 4 is detected. In preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules fall within 2, 3, 4 or more chromosomal regions set forth in Tables 2-4, preferably all such chromosomal regions being in the same table.
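The comparison step in the preceding paragraph can be sketched as an interval lookup. The following is a minimal illustration only, assuming that each sequenced molecule has already been mapped to genomic coordinates and that the chromosomal regions are available as (chromosome, start, end) tuples. The coordinates, the function names, and the `min_regions` parameter are hypothetical placeholders and are not taken from Tables 2-4.

```python
from collections import defaultdict

# Hypothetical placeholder regions keyed by table name: (chromosome, start, end).
# Real coordinates would come from Tables 2-4; these values are illustrative only.
REGIONS = {
    "Table 2": [("chr1", 1000, 5000), ("chr7", 20000, 26000)],
    "Table 3": [("chr3", 500, 900)],
}


def falls_within(fragment, region):
    """Return True if the mapped fragment lies entirely inside the region."""
    f_chrom, f_start, f_end = fragment
    r_chrom, r_start, r_end = region
    return f_chrom == r_chrom and r_start <= f_start and f_end <= r_end


def regions_hit(fragments, regions=REGIONS):
    """Return, per table, the set of regions containing at least one fragment."""
    hits = defaultdict(set)
    for fragment in fragments:
        for table, table_regions in regions.items():
            for region in table_regions:
                if falls_within(fragment, region):
                    hits[table].add(region)
    return dict(hits)


def indicates_cancer(fragments, min_regions=2, regions=REGIONS):
    """Flag a sample when at least min_regions distinct regions of ONE table are hit."""
    hits = regions_hit(fragments, regions)
    return any(len(found) >= min_regions for found in hits.values())
```

Setting `min_regions=1` corresponds to the basic embodiment, in which a single molecule falling within a listed region suffices; the default of 2 reflects the preferred embodiment in which the regions are drawn from the same table.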

In another specific embodiment, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, are isolated from a blood, serum or plasma sample of a patient. These circulating cell-free DNA molecules are hybridized to a microarray of the kind described in the context of the kit invention to determine whether one of the circulating cell-free DNA molecules hybridizes to any one of a plurality of oligonucleotide probes under stringent conditions. Each of the oligonucleotide probes has a nucleotide sequence identical to a part of the sequence of a chromosomal region chosen from Tables 2-5 (or a sequence set forth in Tables A-D). Thus, if a circulating DNA molecule hybridizes under stringent conditions to one of the oligonucleotide probes, it indicates that the circulating DNA molecule has a nucleotide sequence falling within a chromosomal region set forth in Table 2, Table 3, or Table 4. If so, a diagnosis of prostate cancer is made. In the case of a patient treated with a therapy for prostate cancer, recurrence is indicated if a circulating cell-free DNA falling within a chromosomal region set forth in Table 2, Table 3, or Table 4 is detected. In preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules fall within 2, 3, 4 or more chromosomal regions set forth in Tables 2-4, preferably all such chromosomal regions being in the same table, e.g., Table 2, Table 3, or Table 4.

In the above various embodiments, preferably the circulating cell-free DNA molecules are at least 25 consecutive nucleotides in length (preferably at least 50, 70, 80, 100, 120 or 200 consecutive nucleotides in length). More preferably, the circulating cell-free DNA molecules have between about 50 and about 300 or 400, preferably from about 75 to about 300 or 400, more preferably from about 100 to about 200 consecutive nucleotides of a unique sequence within a chromosomal region as set forth in Tables 2-5 (or of a unique sequence set forth in Tables A-D).

Detection of Circulating Nucleic Acids in the Blood

In order to detect the presence of circulating nucleic acids in the blood of patients that may have, or are suspected of having, prostate cancer, a blood sample is obtained from the patient. Serum or plasma from the blood sample is then analyzed for the presence of a circulating cell-free DNA or biomarker as described herein. Nucleic acids can be isolated from serum or plasma using well known techniques, see, e.g., the example sections. In the context of the current invention, the nucleic acid sequences that are analyzed are DNA sequences. Thus, in this section, methods described as evaluating “nucleic acids” refer to the evaluation of DNA.

Detection techniques for evaluating nucleic acids for the presence of a biomarker involve procedures well known in the field of molecular genetics. Further, many of the methods involve amplification of nucleic acids. Ample guidance for performing these procedures is provided in the art. Exemplary references include manuals such as PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Current Protocols in Molecular Biology, Ausubel, 1994-1999, including supplemental updates through April 2004; Sambrook & Russell, Molecular Cloning, A Laboratory Manual (3rd Ed, 2001).

Although the methods may employ PCR steps, other amplification protocols may also be used. Suitable amplification methods include ligase chain reaction (see, e.g., Wu & Wallace, Genomics 4:560-569, 1988); strand displacement assay (see, e.g., Walker et al., Proc. Natl. Acad. Sci. USA 89:392-396, 1992; U.S. Pat. No. 5,455,166); and several transcription-based amplification systems, including the methods described in U.S. Pat. Nos. 5,437,990; 5,409,818; and 5,399,491; the transcription amplification system (TAS) (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177, 1989); and self-sustained sequence replication (3SR) (Guatelli et al., Proc. Natl. Acad. Sci. USA 87:1874-1878, 1990; WO 92/08800). Alternatively, methods that amplify the probe to detectable levels can be used, such as Qβ-replicase amplification (Kramer & Lizardi, Nature 339:401-402, 1989; Lomeli et al., Clin. Chem. 35:1826-1831, 1989). A review of known amplification methods is provided, for example, by Abramson and Myers in Current Opinion in Biotechnology 4:41-47, 1993.

In some embodiments, the detection of a biomarker in the CNA of a patient is performed using oligonucleotide primers and/or probes to detect a target sequence, wherein the target sequence is present in (e.g., comprises some unambiguously assigned portion of) any of the chromosomal regions listed in Table 2, Table 3, Table 4, or Table 5. Oligonucleotides can be prepared by any suitable method, usually chemical synthesis, and can also be purchased through commercial sources. Oligonucleotides can include modified phosphodiester linkages (e.g., phosphorothioate, methylphosphonate, phosphoramidate, or boranophosphate), or linkages other than a phosphorous acid derivative may be incorporated into an oligonucleotide to prevent cleavage at a selected site. In addition, the use of 2′-amino modified sugars tends to favor displacement over digestion of the oligonucleotide when hybridized to a nucleic acid that is also the template for synthesis of a new nucleic acid strand.

In one embodiment, the biomarker is identified by hybridization under sequence-specific hybridization conditions with a probe that targets a chromosomal region described herein. The probe used for this analysis can be a long probe, or sets of short oligonucleotide probes, e.g., from about 20 to about 150 nucleotides in length, may be employed.

Suitable hybridization formats are well known in the art, including but not limited to, solution phase, solid phase, oligonucleotide array formats, mixed phase, or in situ hybridization assays. In solution (or liquid) phase hybridizations, both the target nucleic acid and the probe or primers are free to interact in the reaction mixture. Techniques such as real-time PCR systems have also been developed that permit analysis, e.g., quantification, of amplified products during a PCR reaction. In this type of reaction, hybridization with a specific oligonucleotide probe occurs during the amplification program to identify the presence of a target nucleic acid. Hybridization of oligonucleotide probes ensures the highest specificity due to a thermodynamically controlled two-state transition. Examples of these assay formats are fluorescence resonance energy transfer hybridization probes, molecular beacons, molecular scorpions, and exonuclease hybridization probes (e.g., reviewed in Bustin, J. Mol. Endocrin. 25:169-93, 2000).

Suitable assay formats include array-based formats, described in greater detail below in the “Device” section, where the probe is typically immobilized. Alternatively, the target may be immobilized.

In a format where the target is immobilized, amplified target DNA is immobilized on a solid support and the target complex is incubated with the probe under suitable hybridization conditions; unhybridized probe is removed by washing under suitably stringent conditions, and the solid support is monitored for the presence of bound probe. In formats where the probes are immobilized on a solid support, the target DNA is typically labeled, usually during amplification. The immobilized probe is incubated with the amplified target DNA under suitable hybridization conditions, unhybridized target DNA is removed by washing under suitably stringent conditions, and the solid support/probe is monitored for the presence of bound target DNA.

In typical embodiments, multiple probes are immobilized on a solid support and the target chromosomal regions in the CNA from a patient are analyzed using the multiple probes simultaneously. Examples of nucleic acid arrays are described by WO 95/11995.

In an alternative probe-less method, amplification of a target nucleic acid present in a chromosomal region is performed using nucleic acid primers to the chromosomal region, and the amplified nucleic acid is detected by monitoring the increase in the total amount of double-stranded DNA in the reaction mixture, as described, e.g., in U.S. Pat. No. 5,994,056; and European Patent Publication Nos. 487,218 and 512,334. The detection of double-stranded target DNA relies on the increased fluorescence that various DNA-binding dyes, e.g., SYBR Green, exhibit when bound to double-stranded DNA.

As appreciated by one in the art, specific amplification methods can be performed in reactions that employ multiple primers to target the chromosomal regions such that the biomarker can be adequately covered.

DNA Sequencing

In preferred embodiments, the presence of a sequence from a chromosomal region set forth in Table 2, Table 3, Table 4 or Table 5 in the CNA from a patient undergoing evaluation is detected by direct sequencing. Such sequencing, especially using the Roche 454, Illumina, and Applied Biosystems sequencing systems mentioned below or similar advanced sequencing systems, can include quantitation (i.e., determining the level) of nucleic acids having a particular sequence. Such quantitation can be used in the embodiments of the invention that involve determining the level of a biomarker (some embodiments of which involve correlating a particular level to the presence or absence of cancer). Methods include, e.g., dideoxy sequencing-based methods, although other methods such as Maxam and Gilbert sequencing are also known (see, e.g., Sambrook and Russell, supra). In typical embodiments, CNA from a patient is sequenced using a large-scale sequencing method that provides the ability to obtain sequence information from many reads. Such sequencing platforms include those commercialized by Roche 454 Life Sciences (GS systems), Illumina (e.g., HiSeq, MiSeq) and Applied Biosystems (e.g., SOLiD systems).

The Roche 454 Life Sciences sequencing platform involves using emulsion PCR and immobilizing DNA fragments onto beads. Incorporation of nucleotides during synthesis is detected by measuring light that is generated when a nucleotide is incorporated.

The Illumina technology involves the attachment of randomly fragmented genomic DNA to a planar, optically transparent surface. Attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with clusters containing copies of the same template. These templates are sequenced using a sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes.

Methods that employ sequencing by hybridization may also be used. Such methods, e.g., as used in the ABI SOLiD4+ technology, use a pool of all possible oligonucleotides of a fixed length, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position.

The sequence can be determined using any other DNA sequencing method including, e.g., methods that use semiconductor technology to detect nucleotides that are incorporated into an extended primer by measuring changes in current that occur when a nucleotide is incorporated (see, e.g., U.S. Patent Application Publication Nos. 20090127589 and 20100035252). Other techniques include direct label-free exonuclease sequencing in which nucleotides cleaved from the nucleic acid are detected by passing through a nanopore (Oxford Nanopore) (Clark et al., Nature Nanotechnology 4: 265-270, 2009); and Single Molecule Real Time (SMRT™) DNA sequencing technology (Pacific Biosciences), which is a sequencing-by-synthesis technique.

Devices and Kits

In a further aspect, the invention provides diagnostic devices and kits useful for identifying one or more prostate cancer-associated biomarkers in the CNA from a patient where the one or more biomarkers is a sequence corresponding to any of the chromosomal regions set forth in Table 2, Table 3, Table 4, or Table 5. As will be apparent to skilled artisans, the kit of the present invention is useful in the above-discussed method for analyzing circulating cell-free DNA in a patient sample and in diagnosing, screening or monitoring prostate cancer as described above.

Thus, in one aspect, the present invention provides the use of at least one oligonucleotide for the manufacture of a diagnostic kit useful in diagnosing, screening or monitoring prostate cancer. The nucleotide sequence of the oligonucleotide falls within a chromosomal region set forth in Table 2, Table 3, Table 4, or Table 5 (or matches a part of a sequence set forth in Table A, Table B, Table C, or Table D).

Preferably, the kit of the present invention includes one, two or more (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40 or at least 50, but preferably less than 111, preferably from one to about 50, more preferably from 2 to about 50, or from 3 to about 50) sets of oligonucleotides. Each set comprises one or more oligonucleotides (e.g., from about one to about 10,000, preferably from 50, 100, 200 or 300 to about 10,000). All of the nucleotide sequences of such one or more oligonucleotides in each set fall within the same one single chromosomal region that is set forth in Table 2, Table 3, Table 4, or Table 5 (or match a part of the same one single sequence set forth in Table A, Table B, Table C, or Table D). Each oligonucleotide should have from about 18 to 100 nucleotides, or from 20 to about 50 nucleotides, and is capable of hybridizing, under stringent hybridization conditions, to the chromosomal region in which its sequence falls. The oligonucleotides are useful as probes for detecting circulating cell-free DNA molecules derived from the chromosomal regions. Preferably, each set includes a sufficient number of oligonucleotides with sequences mapped to one chromosomal region such that any circulating cell-free DNA molecules derived from the chromosomal region can be detected with the oligonucleotide set. Thus, the number of oligonucleotides required in each set is determined by the total length of unique nucleotide sequence of a particular chromosomal region, as will be apparent to skilled artisans. Such total lengths are indicated in Tables 2-5 and should also be apparent from Tables A-D.
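The dependence of set size on the length of unique sequence can be illustrated with a simple tiling scheme. The sketch below is one possible design under stated assumptions, not the design used in the kit: the function name `tile_probes`, the 50-nucleotide probe length, and the 25-nucleotide step are hypothetical choices made only for illustration.

```python
def tile_probes(region_sequence, probe_length=50, step=25):
    """Return probes of probe_length tiled every `step` bases across the sequence.

    A final probe flush with the 3' end is added when the regular tiling
    would otherwise leave the tail of the region uncovered.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    n = len(region_sequence)
    if n < probe_length:
        return []  # region too short to hold a single probe of this length
    starts = list(range(0, n - probe_length + 1, step))
    if starts[-1] != n - probe_length:
        starts.append(n - probe_length)
    return [region_sequence[s:s + probe_length] for s in starts]
```

Under this scheme the number of probes grows roughly as the region length divided by the step, which is why longer regions call for larger oligonucleotide sets.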

Preferably, in the kit of the present invention, different oligonucleotide sets correspond to different chromosomal regions within the same table. Preferably, the oligonucleotides are free of repetitive elements. Optionally, the oligonucleotides are attached to one or more solid substrates such as microchips and beads. In preferred embodiments, the kit is a microarray with the above oligonucleotides.

In one embodiment, the kit of the present invention includes a plurality of oligonucleotide sets capable of hybridizing to the chromosomal regions set forth in Table 2, Table 3, Table 4, and Table 5, respectively. That is, the kit includes oligonucleotide probes corresponding to each and every chromosomal region set forth in Tables 2, 3, 4, and 5 (or matching each and every sequence set forth in Tables A, B, C, and D) such that all circulating cell-free DNA derived from any chromosomal region set forth in Tables 2-5 can be detected using the kit.

Use of the oligonucleotides included in the kit described for the manufacture of the kit useful for diagnosing, screening or monitoring prostate cancer is also contemplated. The manufacturing of such a kit should be apparent to a skilled artisan.

In some embodiments, a diagnostic device comprises probes to detect at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 75, or 100, or all 110 chromosomal regions set forth in Tables 2-5. In some embodiments, the present invention provides probes attached to a solid support, such as an array slide or chip, e.g., as described in DNA Microarrays: A Molecular Cloning Manual, 2003, Eds. Bowtell and Sambrook, Cold Spring Harbor Laboratory Press. Construction of such devices is well known in the art, for example as described in U.S. Patents and Patent Publications U.S. Pat. No. 5,837,832; PCT application WO 95/11995; U.S. Pat. No. 5,807,522; U.S. Pat. Nos. 7,157,229, 7,083,975, 6,444,175, 6,375,903, 6,315,958, 6,295,153, and 5,143,854, 2007/0037274, 2007/0140906, 2004/0126757, 2004/0110212, 2004/0110211, 2003/0143550, 2003/0003032, and 2002/0041420. Nucleic acid arrays are also reviewed in the following references: Biotechnol Annu Rev 8:85-101 (2002); Sosnowski et al, Psychiatr Genet 12(4):181-92 (December 2002); Heller, Annu Rev Biomed Eng 4: 129-53 (2002); Kolchinsky et al, Hum. Mutat 19(4):343-60 (April 2002); and McGail et al, Adv Biochem Eng Biotechnol 77:21-42 (2002).

Any number of probes may be implemented in an array. A probe set that hybridizes to different, preferably unique, segments of a chromosomal region may be used where the probe set detects any part of the chromosomal region. Alternatively, a single probe to a chromosomal region may be immobilized to a solid surface. Polynucleotide probes can be synthesized at designated areas (or synthesized separately and then affixed to designated areas) on a substrate, e.g., using a light-directed chemical process. Typical synthetic polynucleotides can be about 15-200 nucleotides in length.

The kit can include multiple biomarker detection reagents, or one or more biomarker detection reagents in combination with one or more other types of elements or components (e.g., other types of biochemical reagents, containers, packages such as packaging intended for commercial sale, substrates to which biomarker detection reagents are attached, electronic hardware components, etc.). Accordingly, the present invention further provides biomarker detection kits and systems, including but not limited to arrays/microarrays of nucleic acid molecules, and beads that contain one or more probes or other detection reagents for detecting one or more biomarkers of the present invention. The kits can optionally include various electronic hardware components; for example, arrays (“DNA chips”) and microfluidic systems (“lab-on-a-chip” systems) provided by various manufacturers typically comprise hardware components. Other kits may not include electronic hardware components, but may be comprised of, for example, one or more biomarker detection reagents (along with, optionally, other biochemical reagents) packaged in one or more containers.

Biomarker detection kits/systems may contain, for example, one or more probes, or sets of probes, that hybridize to a nucleic acid molecule present in a chromosomal region set forth in Tables 2-5.

A biomarker detection kit of the present invention may include components that are used to prepare CNA from a blood sample from a patient for the subsequent amplification and/or detection of a biomarker.

Correlating the Presence of Biomarkers With Prostate Cancer

The present invention provides methods and reagents for detecting the presence of a biomarker in CNA from a patient that has prostate cancer or that is being evaluated to determine if the patient may have prostate cancer. In the context of the invention, “detection” or “identification” or “identifying the presence” or “detecting the presence” of a biomarker associated with prostate cancer in a CNA sample from a patient refers to determining any level of the biomarker in the CNA of the patient where the level is greater than a threshold value that distinguishes between prostate cancer and non-prostate cancer CNA samples for a given assay.

In the current invention, for example, the presence of any one of the chromosomal regions (i.e., biomarkers) listed in Tables 2-4 is indicative of prostate cancer. As appreciated by one of skill in the art, biomarkers may be employed in analyzing a patient sample where the biomarker has also been observed infrequently in a normal patient in order to increase the sensitivity of the detection. For example, the biomarkers indicated by bold font in Tables 2 and 3 have been observed to be present infrequently in CNA obtained from normal individuals; however, given the low frequency of occurrence in normal samples relative to the higher frequency of occurrence in prostate cancer, the presence of the biomarker in a patient indicates that the patient has a 95% or greater likelihood of having prostate cancer. Thus, for example, arrays used to detect the chromosomal regions can include those that identify the chromosomal regions in Tables 2 and 3 that are indicated in bold font.

The biomarkers set forth in Tables 2-4 are associated with prostate cancer, i.e., they are over-represented in prostate cancer patients compared to individuals not diagnosed with prostate cancer. Thus, the detection of one or more of the biomarkers set forth in Tables 2-4 is indicative of prostate cancer, i.e., the patient has an increased probability of having prostate cancer compared to a patient that does not have the biomarker. In some embodiments, the detection of two or more biomarkers set forth in Tables 2-4 in the CNA of a patient is indicative of a greater probability for prostate cancer. As understood in the art, other criteria, e.g., clinical criteria, etc., are also employed to diagnose prostate cancer in the patient. Accordingly, patients that have a biomarker associated with prostate cancer also undergo other diagnostic procedures.

In some embodiments, one or more biomarkers that are under-represented in prostate cancer may be detected in the CNA of a patient. Thus, for example, a biomarker listed in Table 5 may be detected in a CNA sample from a patient where the detection of the biomarker is indicative of a normal diagnosis, i.e., that the patient does not have prostate cancer.

“Over-represented” or “increased amount” means that the level of one or more circulating cell-free DNAs is higher than normal levels. Generally this means an increase in the level as compared to an index value. Conversely, “under-represented” or “decreased amount” means that the level of one or more particular circulating cell-free DNA molecules is lower than normal levels. Generally this means a decrease in the level as compared to an index value.

In preferred embodiments, the test value representing the amount of a particular circulating cell-free DNA is compared to one or more reference values (or index values), and optionally correlated to prostate cancer or cancer recurrence. Optionally, an increased likelihood of prostate cancer is indicated if the test value is greater than the reference value.

Those skilled in the art are familiar with various ways of deriving and using index values. For example, the index value may represent the copy number or concentration of a particular cell-free DNA according to the present invention in a blood sample from a patient of interest in a healthy state, in which case a copy number or concentration in a sample from the patient at a different time or state that is significantly higher (e.g., 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 100-fold or more higher) than this index value would indicate, e.g., prostate cancer or increased likelihood of cancer recurrence.

Alternatively, the index value may represent the average concentration or copy number of a particular circulating cell-free DNA for a set of individuals from a diverse cancer population or a subset of the population. For example, one may determine the average copy number or concentration of a circulating cell-free DNA in a random sampling of patients with prostate cancer. Thus, patients having a copy number or concentration (test value) comparable to or higher than this value are identified as having a higher likelihood of having prostate cancer or prostate cancer recurrence than those having a test value lower than this value.

Often the copy number or amount (e.g., concentration) will be considered “increased” only if it differs significantly from the index value, e.g., at least a 1.5-fold difference in absolute or relative (e.g., as reflected by hybridization signal) copy number or concentration.
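The comparison of a test value against an index value can be written out in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the index value is taken as a plain arithmetic mean of a reference cohort, and the 1.5-fold default is the example threshold given in the text rather than a validated cutoff.

```python
from statistics import mean


def index_from_cohort(values):
    """Index value taken as the average amount across a reference cohort."""
    if not values:
        raise ValueError("cohort must contain at least one value")
    return mean(values)


def is_increased(test_value, index_value, min_fold=1.5):
    """Return True when the test value is at least min_fold times the index value."""
    if index_value <= 0:
        raise ValueError("index value must be positive")
    return test_value / index_value >= min_fold
```

The same ratio can be computed on absolute copy numbers or on relative measures such as hybridization signal, provided the test and index values are expressed in the same units.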

A useful index value may represent the copy number or concentration of a particular circulating cell-free DNA or of a combination (weighted or straight addition) of two or more circulating cell-free DNAs corresponding to the same chromosomal region or different chromosomal regions. When two or more biomarkers or circulating cell-free DNA molecules are used in the diagnosis/monitoring method, the presence or absence of, or the amount of, each biomarker or circulating cell-free DNA can be weighted and combined. Thus, a test value may be provided by (a) weighting the determined status or amount of each circulating cell-free DNA molecule with a predefined coefficient, and (b) combining the weighted status or amount to provide a test value. The combining step can use either straight addition or averaging (i.e., equal weighting) or a different predefined coefficient for each molecule.
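Steps (a) and (b) amount to a weighted sum. The following minimal sketch assumes that the amounts are supplied as a mapping from a biomarker label to a measured value; the function name, the labels, and the coefficients are hypothetical, and no particular coefficients are disclosed in the text.

```python
def weighted_test_value(amounts, coefficients=None):
    """Combine per-biomarker amounts into a single test value.

    amounts: mapping of biomarker label -> measured status or amount.
    coefficients: mapping of biomarker label -> predefined weight.
        When omitted, every biomarker is weighted equally (simple average).
    """
    if not amounts:
        raise ValueError("at least one biomarker amount is required")
    if coefficients is None:
        equal = 1.0 / len(amounts)
        coefficients = {label: equal for label in amounts}
    missing = set(amounts) - set(coefficients)
    if missing:
        raise ValueError(f"no coefficient for: {sorted(missing)}")
    # (a) weight each amount, then (b) combine by addition
    return sum(coefficients[label] * value for label, value in amounts.items())
```

Passing a coefficient of 1.0 for every label gives straight addition, while omitting the coefficients gives the equally weighted average.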

The information obtained from the biomarker analysis may be stored in a computer readable form. Such a computer system typically comprises major subsystems such as a central processor, a system memory (typically RAM), an input/output (I/O) controller, an external device such as a display screen via a display adapter, serial ports, a keyboard, a fixed disk drive via a storage interface and a floppy disk drive operative to receive a floppy disk, and a CD-ROM (or DVD-ROM) device operative to receive a CD-ROM. Many other devices can be connected, such as a network interface connected via a serial port.

The computer system may also be linked to a network, comprising a plurality of computing devices linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line, ISDN line, wireless network, optical fiber, or other suitable signal transmission medium, whereby at least one network device (e.g., computer, disk array, etc.) comprises a pattern of magnetic domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM cells) composing a bit pattern encoding data acquired from an assay of the invention.

The computer system can comprise code for interpreting the results of a study evaluating the presence of one or more of the biomarkers. Thus in an exemplary embodiment, the biomarker analysis results are provided to a computer where a central processor executes a computer program for determining the likelihood that a patient has prostate cancer.

The invention also provides the use of a computer system, such as that described above, which comprises: (1) a computer; (2) a stored bit pattern encoding the biomarker testing results obtained by the methods of the invention, which may be stored in the computer; and, optionally, (3) a program for determining the likelihood of a patient having prostate cancer.

The invention further provides methods of generating a report based on the detection of one or more biomarkers set forth in Tables 2-5.

Thus, the present invention provides systems related to the abovemethods of the invention. In one embodiment the invention provides asystem for analyzing circulating cell-free DNA, comprising: (1) a sampleanalyzer for executing the method of analyzing circulating cell-free DNAin a patient's blood, serum or plasma as described in the variousembodiments above (incorporated herein by reference); (2) a computersystem for automatically receiving and analyzing data obtained in step(1) to provide a test value representing the status (presence or absenceor amount, i.e., concentration or copy number) of one or morecirculating cell-free DNA molecules having a nucleotide sequence of atleast 25 nucleotides falling within a chromosomal region set forth inTable 2, Table 3, Table 4, or Table 5 (or having a nucleotide sequencethat is a part of one of the sequences set forth in Table A, Table B,Table C, or Table D), and optionally for comparing the test value to oneor more reference values each associated with a predetermined status ofprostate cancer. In some embodiments, the system further comprises adisplay module displaying the comparison between the test value and theone or more reference values, or displaying a result of the comparingstep.

Thus, as will be apparent to skilled artisans, the sample analyzer may be, e.g., a sequencing machine (e.g., Illumina HiSeq™, Ion Torrent PGM, ABI SOLiD™ sequencer, PacBio RS, Helicos Heliscope™, etc.), a PCR machine (e.g., ABI 7900, Fluidigm BioMark™, etc.), a microarray instrument, etc.

In one embodiment, the sample analyzer is a sequencing instrument, e.g., a next-generation sequencing instrument such as Roche's GS systems, Illumina's HiSeq and MiSeq, and Life Technologies' SOLiD systems. Circulating cell-free DNA molecules are isolated from a patient's blood or serum or plasma, and the sequences of all of the circulating cell-free DNA molecules are obtained using the sample analyzer. A computer system is then employed for automatically analyzing the sequences to determine the presence or absence or amount of a circulating cell-free DNA molecule having a nucleotide sequence of at least 25 nucleotides falling within a chromosomal region set forth in Table 2, Table 3, Table 4, or Table 5 (or having a nucleotide sequence that is a part of one of the sequences set forth in Table A, Table B, Table C, or Table D) in the sample. For example, the computer system may compare the sequence of each circulating cell-free DNA molecule in the sample to each sequence in Tables A, B, C, and D to determine if there is a match, i.e., if the sequence of a circulating cell-free DNA molecule falls within a sequence in Table A, Table B, Table C, or Table D or within a chromosomal region set forth in Table 2, Table 3, Table 4, or Table 5. The copy number of a particular circulating cell-free DNA molecule is also automatically determined by the computer system. Optionally, the computer system automatically correlates the sequence analysis result with a diagnosis regarding prostate cancer. For example, if one, two or more circulating cell-free DNA molecules are identified to be derived from a chromosomal region in Table 2, Table 3, or Table 4, and preferably two or more circulating cell-free DNA molecules with sequences falling within different chromosomal regions from one single table chosen from Tables 2-4, then the computer system automatically correlates this analysis result with a diagnosis of prostate cancer. Optionally, the computer system further comprises a display module displaying the results of sequence analysis and/or the result of the correlating step. The display module may be, for example, a display screen, such as a computer monitor, TV monitor, or touch screen, a printer, or audio speakers.
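By way of illustration only, the matching and correlating steps described above might be sketched as follows. The region coordinates, the tuple layouts, and the function names are hypothetical placeholders and are not taken from Tables 2-5; "falling within" is assumed here to mean that the aligned read lies entirely inside a region, and the default threshold models the preferred case of two or more distinct regions.

```python
from collections import Counter

# Hypothetical region table: (region_id, chromosome, start, end), inclusive bounds.
# The coordinates are placeholders, not values from Tables 2-5.
REGIONS = [
    ("R1", "chr1", 1_000, 5_000),
    ("R2", "chr2", 20_000, 30_000),
]

MIN_LENGTH = 25  # minimum sequence length recited in the text


def matching_region(chrom, start, end, regions=REGIONS):
    """Return the id of the region fully containing the aligned read, else None."""
    if end - start + 1 < MIN_LENGTH:
        return None
    for region_id, r_chrom, r_start, r_end in regions:
        if chrom == r_chrom and r_start <= start and end <= r_end:
            return region_id
    return None


def analyze(reads, regions=REGIONS, min_regions=2):
    """Count matching reads per region and flag the sample when at least
    `min_regions` distinct regions are hit (an assumed decision rule)."""
    counts = Counter()
    for chrom, start, end in reads:
        region_id = matching_region(chrom, start, end, regions)
        if region_id is not None:
            counts[region_id] += 1
    return dict(counts), len(counts) >= min_regions
```

The per-region counts stand in for the copy number determination, and the boolean stands in for the correlating step; a real system would take its regions from the tables and its reads from an aligner.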

The computer-based analysis function can be implemented in any suitable language and/or browsers. For example, it may be implemented with C language and preferably using object-oriented high-level programming languages such as Visual Basic, SmallTalk, C++, and the like. The application can be written to suit environments such as the Microsoft Windows™ environment including Windows™ 98, Windows™ 2000, Windows™ NT, and the like. In addition, the application can also be written for the Macintosh™, SUN™, UNIX or LINUX environment. In addition, the functional steps can also be implemented using a universal or platform-independent programming language. Examples of such multi-platform programming languages include, but are not limited to, hypertext markup language (HTML), JAVA™, JavaScript™, Flash programming language, common gateway interface/structured query language (CGI/SQL), practical extraction report language (PERL), AppleScript™ and other system script languages, programming language/structured query language (PL/SQL), and the like. Java™- or JavaScript™-enabled browsers such as HotJava™, Microsoft™ Explorer™, or Netscape™ can be used. When active content web pages are used, they may include Java™ applets or ActiveX™ controls or other active content technologies.

The analysis function can also be embodied in computer program products and used in the systems described above or other computer- or internet-based systems. Accordingly, another aspect of the present invention relates to a computer program product comprising a computer-usable medium having computer-readable program codes or instructions embodied thereon for enabling a processor to carry out the analysis and correlating functions as described above. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions or steps described above. These computer program instructions may also be stored in a computer-readable memory or medium that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory or medium produce an article of manufacture including instruction means which implement the analysis. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions or steps described above.

The following examples are provided by way of illustration only and not by way of limitation. Those of skill in the art will readily recognize a variety of non-critical parameters that could be changed or modified to yield essentially similar results.

EXAMPLES

Example 1 Identification of Prostate Cancer-Associated CNA

Sequencing of CNA:

Sera from 197 patients having biopsy-proven prostate cancer with a Gleason score of 3 to 10 were used. The data were compared to samples obtained from 92 apparently healthy male blood donors (normals).

After extraction of DNA from serum or plasma using standard silica-based methods, a whole genome amplification was performed in duplicate. The products were pooled and used for further analysis.

Long sequencing runs:

The primer sequences for 454 sequencing were added to the product using fusion primers in not more than 20 cycles of PCR. The resulting product was treated according to the 454 sequencer manual and used for direct sequence detection.

Computational analysis:

The sequence reads were assigned to the sample source by reading the identifier sequence string, and all non-source parts (e.g., primers) were cut out.

The origin of the circulating DNA was investigated by local alignment analyses using the BLAST program with highly stringent parameters (30). Repetitive elements were detected and masked using a local install of the RepeatMasker software package (31) with Repbase (version 12.09), which was obtained from the Genetic Information Research Institute (32). After masking of the repetitive elements and regions of low sequence complexity, each sequence was subjected to sequential BLAST analyses querying databases of bacterial, viral and fungal genomes and the human genome (reference genome build 37.1). Bacterial, viral, fungal and human genomes were obtained from the National Center for Biotechnology Information (NCBI, ftp.ncbi.nih.gov). After each of the sequential database searches, all parts of a queried sequence that produced significant hits (e<0.0001) were masked, and the masked sequences were subsequently used to query the next database. Masked nucleotides were counted and subtracted from the total nucleotide counts, resulting in the amounts of unidentified nucleotides.
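The sequential mask-and-query procedure described above can be sketched as follows, for illustration only. The `search` callback is a hypothetical stand-in for a BLAST run that has already applied the significance cutoff, and `toy_search` is an exact-substring substitute used solely so that the sketch is self-contained; neither reflects the actual software configuration used.

```python
def mask(sequence, intervals, mask_char="N"):
    """Replace the 0-based half-open intervals of `sequence` with `mask_char`."""
    chars = list(sequence)
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


def sequential_search(sequence, databases, search):
    """Query each database in turn, masking significant hits before the next query.

    `search(db, seq)` is a hypothetical callback standing in for a BLAST run; it
    returns the query intervals of hits that passed the significance cutoff.
    Returns the final masked sequence and the number of unidentified nucleotides.
    """
    masked = sequence
    for db in databases:
        masked = mask(masked, search(db, masked))
    # Simplification: assumes the input itself contains no "N" characters.
    unidentified = len(sequence) - masked.count("N")
    return masked, unidentified


def toy_search(db, seq):
    """Toy stand-in for BLAST: exact substring matches of each database motif."""
    hits = []
    for motif in db:
        pos = seq.find(motif)
        if pos != -1:
            hits.append((pos, pos + len(motif)))
    return hits
```

Because each database is queried with the sequence as masked by the preceding searches, a stretch already attributed to one source cannot be counted again for a later one.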

For each query fragment and each database search, the highest scoring BLAST hit with a length of more than 50% of the query sequence was recorded in a SQL database. The highest scoring BLAST hit was defined as the longest hit with the highest percent identity (maximum of hit length × identity). For each of the sequences, the start and stop positions for query and database were recorded.
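The hit-selection rule described above might be expressed, as a minimal sketch, in the following way. The dictionary keys `length` and `identity` are hypothetical field names chosen for illustration.

```python
def best_hit(query_length, hits):
    """Pick the hit maximizing alignment length x percent identity.

    `hits` is a list of dicts with hypothetical keys "length" and "identity";
    only hits longer than 50% of the query are eligible. Returns None when no
    hit qualifies.
    """
    eligible = [h for h in hits if h["length"] > 0.5 * query_length]
    if not eligible:
        return None
    return max(eligible, key=lambda h: h["length"] * h["identity"])
```

A short hit with perfect identity is therefore excluded outright if it covers half the query or less, whereas among eligible hits a longer alignment can outrank a shorter one of higher identity.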

Disease association analysis:

To investigate which sequences/sequence tags are disease associated, all alignments that could be unambiguously placed on the human genomic database (Homo sapiens Build 37.1, http://www.ncbi.nlm.nih.gov/mapview/) were used. These were categorized into 4060 regions of 750,000 bp intervals and selected for differences between normal controls and cancer patients on the basis of a group comparison using the non-parametric median test. The five hundred chromosomal regions having the lowest p-values were used for further analyses.
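For illustration only, the binning and region-ranking step might be sketched as below. The sketch assumes that the quantity compared between groups is a per-sample hit count in each interval, which the text does not specify, and it uses Mood's median test in its 2×2 chi-square form without continuity correction as one common form of the non-parametric median test; all function names are hypothetical.

```python
import math
from statistics import median

BIN_SIZE = 750_000  # interval width used in the text


def bin_index(position):
    """Map a 0-based chromosomal position to its 750,000 bp interval index."""
    return position // BIN_SIZE


def median_test_p(cases, controls):
    """Mood's median test (2x2 chi-square, 1 d.f., no continuity correction)."""
    grand = median(list(cases) + list(controls))
    a = sum(1 for v in cases if v > grand)      # cases above the grand median
    b = len(cases) - a                          # cases at or below it
    c = sum(1 for v in controls if v > grand)   # controls above the grand median
    d = len(controls) - c                       # controls at or below it
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.f.


def rank_regions(counts_by_region, top_n=500):
    """Return the `top_n` region keys with the lowest median-test p-values."""
    p_values = {
        region: median_test_p(cases, controls)
        for region, (cases, controls) in counts_by_region.items()
    }
    return sorted(p_values, key=p_values.get)[:top_n]
```

Keys of `counts_by_region` would typically be (chromosome, interval index) pairs, with each value holding the per-sample counts for the cancer and normal groups.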

In 347 rounds of bootstrap with randomly selected samples, linear multivariate regression modeling was performed using the above selected 750 kbp regions as independent parameters. The 347 rounds of bootstrap yielded 8.9×10⁷ models. From these, the best 42 independent variables (750 kbp pieces, based on ΣAICω_i) were chosen and are shown in Table 1.
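One plausible reading of the weight-based variable selection described above is that each model receives an Akaike weight derived from its AIC value and that each region is ranked by the sum of the weights of the models in which it appears. That reading is an assumption, as are the data layout and function names in the following sketch; the regression fitting itself is not shown.

```python
import math
from collections import defaultdict


def akaike_weights(aic_values):
    """Convert AIC values to Akaike weights that sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def rank_variables(models, top_n=42):
    """Rank variables by the summed Akaike weight of the models containing them.

    `models` is a list of (variables, aic) pairs; this structure is hypothetical.
    """
    weights = akaike_weights([aic for _, aic in models])
    importance = defaultdict(float)
    for (variables, _), w in zip(models, weights):
        for v in variables:
            importance[v] += w
    return sorted(importance, key=importance.get, reverse=True)[:top_n]
```

Under this reading, a region that appears in many well-supported models accumulates a high summed weight even if no single model containing it is the overall best.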

Selection of Clusters in the Pre-Selected Regions

As a second step, a cluster search within the 42 regions was performed, and clusters of cancer samples were found that did not contain any normal samples. Additional clusters were searched in which not more than one normal sample was found in a cluster of at least 10 cancer samples. The cluster regions were restricted to those that had a distance to the next normal hit of at least 200 bp. Given these limits, 88 clusters were found, of which 25 contained one hit in the normal group. A sample is called positive if at least one hit is found in any of the cluster regions (see Table 2). A selection of 30 clusters without false positives covering 441 kbp was sufficient to provide a 94% sensitivity and 100% specificity; the sensitivity can be increased to 96% when two more clusters with a hit in one normal are included, at a specificity of 99%.
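The positive-call rule and the resulting sensitivity and specificity can be sketched as follows, purely by way of example. The data layout (hits as chromosome and position pairs, clusters as inclusive intervals) and the function names are assumptions made for the sketch.

```python
def is_positive(sample_hits, clusters):
    """A sample is positive if any hit lies inside any cluster region.

    `sample_hits` holds (chromosome, position) pairs; `clusters` holds
    (chromosome, start, end) triples with inclusive bounds.
    """
    return any(
        chrom == c_chrom and c_start <= pos <= c_end
        for chrom, pos in sample_hits
        for c_chrom, c_start, c_end in clusters
    )


def sensitivity_specificity(cancer_samples, normal_samples, clusters):
    """Fraction of cancer samples called positive and of normals called negative."""
    tp = sum(is_positive(s, clusters) for s in cancer_samples)
    tn = sum(not is_positive(s, clusters) for s in normal_samples)
    return tp / len(cancer_samples), tn / len(normal_samples)
```

Adding a cluster that contains a hit from one normal sample can only raise or maintain sensitivity while lowering or maintaining specificity, which is the trade-off reported above.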

Selection of Genomic Clusters

A search for genomic clusters was also performed over the whole non-repetitive genome. The limit was set to more than 18 independent patient hits per cluster, and over 600 clusters with not more than 1 hit in samples from normals were found. A selection of 22 clusters covering about 475 kbp was sufficient to give a rate of 99.5% true positives at a 99% true negative level (see Table 3). Excluding the two regions that had one hit in the group of normal, non-prostate cancer patients, the true positive rate was 96% with, consequently, 100% specificity.

Example 2 Identification of Additional Biomarkers That are Over- or Under-Represented in Prostate Cancer

In a further analysis to identify biomarkers, a subset of 77 prostate cancer samples and 70 controls was sequenced on a Life Technologies SOLiD4+ System, which is a high-throughput sequencer that typically can run hundreds of millions of sequences of lengths of about 30 to about 75 bases in parallel. (Another example of such a sequencing technology is the Genome Analyzer (Illumina).) Briefly, samples were prepared as described for the 454 sequencing, with adaptors specific to SOLiD sequencing. For each sample, about 5 million reads of 40 bp were obtained and aligned to the human genome (Build Hg18) using the Life Technologies “Bioscope” software suite with default stringency values. All uniquely aligned reads were used for further analysis.

In order to define the chromosomal regions where samples of either group cluster in comparison to the second group, all hits were ranked in order of their appearance on the chromosomes. Biomarker regions were initially defined as those regions where at least 15 hits covering at least 10 samples of a group were counted, but not more than one sample of the other group demonstrated a hit.
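As an illustrative sketch, the initial criteria for a biomarker region might be checked for a single candidate region as follows. How candidate region boundaries are drawn from the ranked hits is not specified above and is not modeled here; the input layout and the group labels are hypothetical.

```python
def qualifies(hits, group, min_hits=15, min_samples=10, max_other_samples=1):
    """Check the initial biomarker-region criteria for one candidate region.

    `hits` is a list of (sample_id, group_label) pairs, one per aligned read
    in the region; `group` is the label of the group expected to cluster.
    """
    own = [s for s, g in hits if g == group]
    other_samples = {s for s, g in hits if g != group}
    return (
        len(own) >= min_hits
        and len(set(own)) >= min_samples
        and len(other_samples) <= max_other_samples
    )
```

The same check can be applied with `group` set to the normal label in order to find regions in which normal samples cluster.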

In 1000 rounds of random resampling, the samples were divided into a training set and a validation set, each containing 50% of each group. For each region, a sample with a hit in the region was assigned a score of 1 if the cluster was a prostate cancer (PrCa) cluster and a score of −1 for regions where normal samples clustered. The average AUC for the validation set was found to be 0.92 if the score sums of 42 cluster regions are used. The regions used in the 1000 rounds of resampling were ranked according to the number of rounds in which each region participated. Tables 4 and 5 show the 56 highest ranking regions. Table 4 summarizes those clusters identified in this analysis that were overrepresented in prostate cancer serum samples. Table 5 summarizes clusters identified in this analysis that were underrepresented in prostate cancer serum samples. The ROC curve using the 42 highest ranking regions is shown in FIG. 1. The AUC below the ROC curve was calculated as 0.96 (0.886-0.996) with an accuracy of 92% (86%-97%).
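The score-sum and AUC computations described above can be sketched as follows, for illustration only. The set-based representation of the regions hit by a sample, the pairwise AUC estimate, and the function names are assumptions of the sketch rather than a description of the software actually used.

```python
import random


def score_sum(sample_regions, prca_regions, normal_regions):
    """+1 for each PrCa cluster region hit by the sample, -1 for each normal one."""
    return (len(sample_regions & prca_regions)
            - len(sample_regions & normal_regions))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in case_scores
        for n in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


def split_half(samples, rng):
    """Randomly divide one group into training and validation halves."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

In each resampling round, regions would be selected on the training halves of both groups and the AUC evaluated on the score sums of the validation halves; passing a seeded `random.Random` instance to `split_half` keeps the rounds reproducible.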

All patents, patent applications, and other published reference materials cited in this specification are hereby incorporated herein by reference in their entirety for their disclosures of the subject matter in whose connection they are cited herein.

Lengthy table referenced here US20160002739A1-20160107-T00001 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00002 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00003 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00004 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00005 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00006 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00007 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00008 Please refer to the end of the specification for access instructions.

Lengthy table referenced here US20160002739A1-20160107-T00009 Please refer to the end of the specification for access instructions.

LENGTHY TABLES

The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20160002739A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

What is claimed is:
1. A method of analyzing a sample from a patient for cell-free DNA associated with prostate cancer to calculate a numeric index of tumor-associated copy number variability, the method comprising: isolating cell-free DNA from a blood, serum, or plasma sample from the patient; sequencing cell-free DNA molecules; for each of a plurality of circulating cell-free DNA molecules, unambiguously assigning cell-free DNA sequences to a genomic region; determining the normalized copy numbers of the plurality of cell-free DNA molecules; comparing the normalized copy numbers to a reference value; summing all regions that have a normalized copy number greater than that of 95% of the reference value; and summing all regions that have a normalized copy number less than 5% of the reference value, to obtain a numeric index of tumor-associated copy number variability.
2. The method of claim 1, wherein the reference value is derived from control subjects that are known to be cancer-free.
3. The method of claim 1, wherein the reference value is derived from the patient, by comparing each region to all regions.
4. The method of claim 1, wherein the index value is compared to the index value in cancer-free control subjects, where an index of >95% of that reference is indicative of cancer.
5. A method of diagnosing or screening for prostate cancer in a patient, comprising: determining, in a sample that is blood, serum or plasma from said patient, the presence or absence or the amount of, a circulating cell-free DNA having a sequence falling within a chromosomal region set forth in a table selected from the group consisting of Table 2, Table 3, and Table 4; and correlating the presence of or an increased amount of said circulating cell-free DNA with an increased likelihood that said patient has prostate cancer.